KiaDev Intelligence

#spurious rewards
28/05/2025

Surprising Math Reasoning Gains from Incorrect and Random Rewards in Qwen2.5-Math

Qwen2.5-Math models show significant math reasoning gains even when trained with incorrect or random reward signals, pointing to reinforcement learning dynamics not seen in other models.
